3. Results¶
3.1. Search Grid With 5-fold Cross Validation¶
The model was fine-tuned by finding optimal values for the following hyperparameters: 1) L2 regularization, 2) the dropout rate applied to each hidden layer, and 3) the learning_rate.
Optimal hyperparameters were found with a full grid search over 75 combinations of the three hyperparameters, each combination validated with 5-fold cross-validation; in total, 75 models were trained and validated. The following plot shows the performance of each model in terms of its training and validation accuracies. The models are shown in descending order of their mean validation accuracy, and the error bars indicate the standard deviation across folds.
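The grid-search-with-cross-validation procedure can be sketched as follows. The specific hyperparameter values and the scoring stub are illustrative assumptions; only the grid size (5 × 5 × 3 = 75 combinations) and the 5-fold scheme come from the text.

```python
from itertools import product

import numpy as np

# Hypothetical hyperparameter grids (5 x 5 x 3 = 75 combinations);
# the individual values here are assumptions, not taken from the study.
l2_grid = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
dropout_grid = [0.1, 0.2, 0.3, 0.4, 0.5]
lr_grid = [1e-4, 1e-3, 1e-2]

n_samples, n_folds = 100, 5
rng = np.random.default_rng(0)
fold_ids = rng.permutation(n_samples) % n_folds  # assign each sample to a fold

def train_and_score(train_idx, val_idx, l2, dropout, lr):
    """Placeholder for one training run; a real pipeline would fit the
    model on train_idx here and return its validation accuracy."""
    return rng.uniform(0.5, 0.9)

results = []
for l2, dropout, lr in product(l2_grid, dropout_grid, lr_grid):
    fold_scores = []
    for k in range(n_folds):
        train_idx = np.flatnonzero(fold_ids != k)
        val_idx = np.flatnonzero(fold_ids == k)
        fold_scores.append(train_and_score(train_idx, val_idx, l2, dropout, lr))
    results.append({"l2": l2, "dropout": dropout, "lr": lr,
                    "mean_acc": float(np.mean(fold_scores)),
                    "std_acc": float(np.std(fold_scores))})

# Rank configurations by mean validation accuracy, best first, matching
# the ordering used in the plot
results.sort(key=lambda r: r["mean_acc"], reverse=True)
```

Sorting by the cross-fold mean (with the per-fold standard deviation kept alongside) reproduces the descending ordering and error bars described above.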
The thirty-ninth model, with mean training and validation accuracies of 0.89 and 0.82, respectively, was the best-performing model and was chosen as the final model to be tested on the held-out test set. The final model hyperparameters were: 1) L2 = 0.003, 2) dropout = 0.3, and 3) learning_rate = 0.001.
3.2. Test Accuracy¶
The trained model was tested on near-miss segments from the 19 held-out participants. The following figure shows temporal and overall accuracy on these participants. The model performs reasonably well from the 1st timepoint (TP) onward, with a mean accuracy of 0.80, and the mean accuracy steadily increases to 0.89 by the 7th TP. “Overall” accuracy is the mean accuracy across TPs, which is 0.83.
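The per-timepoint and overall accuracies described above reduce to two NumPy means; the predictions and labels below are synthetic stand-ins for the real model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-timepoint predictions and true labels for 200 segments
# over 7 timepoints (placeholders for the real held-out data)
labels = rng.integers(0, 2, size=(200, 7))
preds = labels.copy()
flip = rng.random(labels.shape) < 0.15  # inject ~15% errors
preds[flip] = 1 - preds[flip]

per_tp_acc = (preds == labels).mean(axis=0)  # accuracy at each timepoint
overall_acc = per_tp_acc.mean()              # "overall" = mean across TPs
```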
3.3. Chance Accuracy¶
The model with the best hyperparameters was trained one hundred times on the training set with randomly shuffled labels. After each training iteration, the model was tested on a validation set with non-shuffled (i.e., true) labels. This procedure simulates a distribution of chance accuracy, whose mean served as the baseline against which the observed performance of the model trained on true labels was compared. The observed accuracy was significantly greater than the average chance accuracy (p < 0.01).
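A minimal sketch of the permutation test, with the chance-accuracy distribution replaced by synthetic draws (each value would really come from one shuffled-label training run). The reported p = 0.0099 equals 1/101, which suggests the standard (count + 1) / (n_permutations + 1) convention with 100 permutations and no shuffled-label run matching the observed accuracy.

```python
import numpy as np

rng = np.random.default_rng(42)

observed_acc = 0.83  # accuracy of the model trained on true labels

# Stand-in for 100 shuffled-label training runs; in the real pipeline
# each value is the true-label validation accuracy of one such run
chance_accs = rng.normal(loc=0.50, scale=0.02, size=100)

# Empirical one-sided p-value with the +1 correction, so p is never 0
p = (np.sum(chance_accs >= observed_acc) + 1) / (len(chance_accs) + 1)
```

With 100 permutations, the smallest attainable p is 1/101 ≈ 0.0099.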
Accuracy: observed = 0.83, chance = 0.50; observed > chance (p = 0.0099).
3.4. Temporal Trajectories¶
A GRU outputs hidden states that are typically high-dimensional. The hidden states (\(h_{t}\)) capture the spatio-temporal variance that is most useful in maintaining class separability. To visualize their dynamics, \(h_{t}\) was linearly projected onto a lower-dimensional (3D) space, \(\hat{h_{t}}\), by replacing the output layer with a Dimensionality Reduction Dense Layer (DRDL) with three linear units. Because the GRU preceding this layer is non-linear, the overall mapping is, in essence, a supervised non-linear dimensionality reduction, even though the final projection itself is linear.
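The projection step itself is a single affine map from the hidden dimension to three units. A NumPy sketch, with a hypothetical hidden size of 64 and random stand-in weights (the trained DRDL weights are not given in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden states h_t: (n_segments, n_timepoints, hidden_dim);
# the sizes here are illustrative assumptions
h = rng.normal(size=(50, 7, 64))

# Weights of the 3-unit linear DRDL layer; random here, but in practice
# learned jointly with the classification objective
W = rng.normal(size=(64, 3))
b = np.zeros(3)

h_hat = h @ W + b  # projected states, shape (50, 7, 3)
```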
The 3-dimensional representations of \(h_{t}\) (\(\hat{h_{t}}\)) for both stimulus classes are plotted along the three axes of the coordinate system. The plot shows the temporal trajectories of the two classes: at the first timepoint the two classes are closest to each other, and the distance between them increases with every timepoint. The next plot shows the Euclidean distance between the two classes as a function of time.
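The per-timepoint Euclidean distance between the two class trajectories can be computed from the projected states by taking each class's mean trajectory and measuring the norm of their difference at every timepoint; the data below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

# Projected hidden states (n_segments, n_timepoints, 3) and a binary
# class label per segment (synthetic stand-ins)
h_hat = rng.normal(size=(100, 7, 3))
y = rng.integers(0, 2, size=100)

mean_a = h_hat[y == 0].mean(axis=0)  # (7, 3) mean trajectory, class 0
mean_b = h_hat[y == 1].mean(axis=0)  # (7, 3) mean trajectory, class 1

# Euclidean distance between the class trajectories at each timepoint
dist_per_tp = np.linalg.norm(mean_a - mean_b, axis=1)  # shape (7,)
```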